{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "9e77c63e-eb1a-4731-a37f-752997945e8a",
   "metadata": {},
   "source": [
    "<!-- Banner Image -->\n",
    "<img src=\"https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdevnotebooks.png\" width=\"100%\">\n",
    "\n",
    "<!-- Links -->\n",
    "<center>\n",
    "  <a href=\"https://brev.nvidia.com\" style=\"color: #06b6d4;\">Console</a> •\n",
    "  <a href=\"https://developer.nvidia.com/brev\" style=\"color: #06b6d4;\">Docs</a> •\n",
    "  <a href=\"/\" style=\"color: #06b6d4;\">Templates</a> •\n",
    "  <a href=\"https://discord.gg/NVDyv7TUgJ\" style=\"color: #06b6d4;\">Discord</a>\n",
    "</center>\n",
    "\n",
    "# Finetune and convert Llama3 to an Ollama Modelfile 🤙\n",
    "\n",
    "Hi everyone!\n",
    "\n",
    "In the last couple months, Ollama has taken the AI community by a storm. It's one of the easiest ways of interacting with open souce models and has a ton of support from the community. One thing we havn't seen online is end-to-end guides on how to finetune a model and then convert it to an Ollama model so it can be used anywhere...so we decided to build one :)\n",
    "\n",
    "In this notebook, we will do the following. \n",
    "1. Finetune Llama3-8B instruct using a novel finetuning method called ORPO. This combines Supvervised-Finetuning with Direct Preference Optimization and happens all in 1 step\n",
    "2. We will install `llama.cpp` and convert our model to `gguf` format and talk about the different options (and how LoRA plays in here)\n",
    "3. We will write our Ollama `Modelfile` which can be thought of as a Dockerfile for Ollama models\n",
    "4. We will locally test and then push our model to the Ollama hub\n",
    "5. We will pull our model on a separate machine and launch it for inference\n",
    "\n",
    "Grab some snacks, buckle-up, and let it rip 🤙!\n",
    "\n",
    "Here's the link to the corresponding [video](https://www.youtube.com/watch?v=sK7yqqrK2fE&t=1s)\n",
    "\n",
    "#### Help us make this tutorial better! Please provide feedback on the [Discord channel](https://discord.gg/T9bUNqMS8d) or on [X](https://x.com/brevdev).\n",
    "\n",
    "A note about running Jupyter Notebooks: Press Shift + Enter to run a cell. A * in the left-hand cell box means the cell is running. A number means it has completed. If your Notebook is acting weird, you can interrupt a too-long process by interrupting the kernel (Kernel tab -> Interrupt Kernel) or even restarting the kernel (Kernel tab -> Restart Kernel). Note restarting the kernel will require you to run everything from the beginning."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0bb0724a-d33c-46c5-aa8c-1fcf8340b3a8",
   "metadata": {},
   "source": [
    "# Phase 1: Finetune Llama3 with ORPO"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1bf10d7e-22c1-4d26-9fdb-346877cf38c9",
   "metadata": {},
   "source": [
    "So far, we've released guides on finetuning Llama3 with 2 different approaches\n",
    "1. [Supervised Finetuning](https://github.com/brevdev/notebooks/blob/main/llama3_finetune_inference.ipynb)\n",
    "2. [Direct Preference Optimization](https://github.com/brevdev/notebooks/blob/main/llama3dpo.ipynb)\n",
    "\n",
    "If you havn't taken a look at these, I suggest skimming through the code for the SFT notebook and the explanation at the top of the DPO notebook. TLDR: Most of the finetuning in guides and online use SFT. SFT is good at adapting a model to domain-specific ouput but might also increase the probability of generating outputs that are not as desirable. To solve this issue, we can then perform a process known as preference alignment. This can be done using RLHF or DPO. However, these are both computationally expensive. \n",
    "\n",
    "Enter Odds Ratio Policy Optimization. At a high level, this combines SFT and DPO into 1 neat step which weakly penalizes rejected responses while strongly rewarding preferred responses. Here we optimize for two objectives at once: learning domain-specific output AND aligning the output with out preferences. If you'd like to dive deeper into ORPO, check out MLabonne's [fantastic guide](https://huggingface.co/blog/mlabonne/orpo-llama-3) on HuggingFace\n",
    "\n",
    "![ORPO](https://i.imgur.com/ftrth4Q.png)\n"
   ]
  },
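  {
   "cell_type": "markdown",
   "id": "orpo-loss-sketch",
   "metadata": {},
   "source": [
    "To make the objective above a bit more concrete, here is a minimal pure-Python sketch of the loss described in the ORPO paper: the usual SFT loss on the chosen response plus a weighted odds-ratio term. The function names, the toy probabilities, and the weight `lam` are assumptions made for illustration. In practice TRL's `ORPOTrainer` computes this from token-level log-probabilities, and its `beta` setting plays the role of `lam`.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def log_odds(avg_logp):\n",
    "    # odds(y|x) = p / (1 - p), where p = exp(average token log-probability)\n",
    "    return avg_logp - math.log1p(-math.exp(avg_logp))\n",
    "\n",
    "def orpo_loss(avg_logp_chosen, avg_logp_rejected, lam=0.1):\n",
    "    # SFT term: negative log-likelihood of the chosen response\n",
    "    sft = -avg_logp_chosen\n",
    "    # Odds-ratio term: -log(sigmoid(log odds ratio)), written as log(1 + exp(-x))\n",
    "    log_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)\n",
    "    odds_ratio_term = math.log1p(math.exp(-log_ratio))\n",
    "    return sft + lam * odds_ratio_term\n",
    "\n",
    "# Toy values: the chosen response is more likely than the rejected one, then the reverse\n",
    "good = orpo_loss(math.log(0.6), math.log(0.2))\n",
    "bad = orpo_loss(math.log(0.2), math.log(0.6))\n",
    "print(good < bad)\n",
    "```\n",
    "\n",
    "Swapping the two probabilities makes the loss jump, which is the behavior we want: preferred responses are rewarded and rejected ones are penalized in the same step."
   ]
  },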
  {
   "cell_type": "markdown",
   "id": "b3db3f2f-3afc-4e6d-beca-a2d8a3459fd3",
   "metadata": {},
   "source": [
    "## Install Dependancies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "611980ad-56f3-4375-8174-b01c6f921cb5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!pip install bitsandbytes\n",
    "!pip install wandb\n",
    "!pip install transformers\n",
    "!pip install peft\n",
    "!pip install accelerate\n",
    "!pip install trl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c25c100-5268-4197-bd68-6f3fb3922328",
   "metadata": {},
   "outputs": [],
   "source": [
    "import gc\n",
    "import os\n",
    "\n",
    "import torch\n",
    "import wandb\n",
    "from datasets import load_dataset\n",
    "from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training\n",
    "from transformers import (\n",
    "    AutoModelForCausalLM,\n",
    "    AutoTokenizer,\n",
    "    BitsAndBytesConfig,\n",
    "    TrainingArguments,\n",
    "    pipeline,\n",
    ")\n",
    "from trl import ORPOConfig, ORPOTrainer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0dda9016-cb24-41fc-8756-a6eed21b5777",
   "metadata": {},
   "source": [
    "## Load Model, Tokenizer, and Dataset\n",
    "\n",
    "We'll be working with the Llama3-8B instruct model. But the steps are almost the same with any other model if you decide to finetune with a LoRA adapter. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cd703c54-9a29-481e-85bc-f7623431e746",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model\n",
    "base_model = \"meta-llama/Meta-Llama-3-8B-Instruct\"\n",
    "new_model = \"brevity-adapter\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "187f871a-fa77-4f5a-8e10-db9b6aae210b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# QLoRA config\n",
    "bnb_config = BitsAndBytesConfig(\n",
    "    load_in_4bit=True,\n",
    "    bnb_4bit_quant_type=\"nf4\",\n",
    "    bnb_4bit_compute_dtype=torch.bfloat16,\n",
    "    bnb_4bit_use_double_quant=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "99e66bf1-bb51-4a19-be82-475d1ea5911e",
   "metadata": {},
   "outputs": [],
   "source": [
    "peft_config = LoraConfig(\n",
    "    r=16,\n",
    "    lora_alpha=32,\n",
    "    lora_dropout=0.05,\n",
    "    bias=\"none\",\n",
    "    task_type=\"CAUSAL_LM\",\n",
    "    target_modules=['up_proj', 'down_proj', 'gate_proj', 'k_proj', 'q_proj', 'v_proj', 'o_proj']\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f1da12a-aadb-4ba5-8e0b-938d476f624d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import notebook_login\n",
    "\n",
    "notebook_login()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0661cc8-f8cf-43d7-bb08-ea7b4e98f235",
   "metadata": {},
   "outputs": [],
   "source": [
    "tokenizer = AutoTokenizer.from_pretrained(base_model)\n",
    "tokenizer.pad_token = tokenizer.eos_token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fed59a18-d9a1-4c0c-bc30-54907a335a5d",
   "metadata": {},
   "outputs": [],
   "source": [
    "model = AutoModelForCausalLM.from_pretrained(\n",
    "    base_model,\n",
    "    quantization_config=bnb_config,\n",
    "    device_map=\"auto\",\n",
    ")\n",
    "model = prepare_model_for_kbit_training(model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d7521507-bb69-402a-aeaf-c67909ff8b35",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset_name = \"mlabonne/orpo-dpo-mix-40k\"\n",
    "dataset = load_dataset(dataset_name, split=\"all\")\n",
    "dataset = dataset.shuffle(seed=42).select(range(150))\n",
    "\n",
    "def format_chat_template(row):\n",
    "    row[\"chosen\"] = tokenizer.apply_chat_template(row[\"chosen\"], tokenize=False)\n",
    "    row[\"rejected\"] = tokenizer.apply_chat_template(row[\"rejected\"], tokenize=False)\n",
    "    return row\n",
    "\n",
    "dataset = dataset.map(\n",
    "    format_chat_template,\n",
    "    num_proc= os.cpu_count(),\n",
    ")\n",
    "dataset = dataset.train_test_split(test_size=0.01)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c6af1a8-7c1e-4e70-9039-9b88a63a5190",
   "metadata": {},
   "source": [
    "Take a look at the dataset by uncommenting the code block below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4247190e-f8b4-473d-ab07-30f83ddb0eb8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# for k in dataset['train'].features.keys():\n",
    "#     print(k)\n",
    "#     print(\"---------\")\n",
    "#     print(dataset['train'][1][k])\n",
    "#     print('\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5a0360c7-3af5-416b-8410-572dad9e3658",
   "metadata": {},
   "source": [
    "## Set up ORPO Training\n",
    "\n",
    "The ORPO trainer looks very similar to the SFTTrainer and the DPO trainer. We set our config parameters and start our training run. Notice that the run is very short. In order to increase it, you can increase the `num_train_epochs` or add a `max_steps` argument"
   ]
  },
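  {
   "cell_type": "markdown",
   "id": "training-steps-sketch",
   "metadata": {},
   "source": [
    "As a rough guide to how long a run will be, the sketch below estimates the number of optimizer steps from the batch settings. The helper name and the figure of about 148 training rows (150 sampled rows minus the small test split) are assumptions for illustration. How leftover examples at the end of an epoch are rounded depends on the library version, so this sketch simply rounds down and should be read as an estimate.\n",
    "\n",
    "```python\n",
    "def approx_optimizer_steps(n_train, per_device_batch, grad_accum,\n",
    "                           epochs=1, n_devices=1, max_steps=None):\n",
    "    # One optimizer step consumes per_device_batch * grad_accum * n_devices examples\n",
    "    effective_batch = per_device_batch * grad_accum * n_devices\n",
    "    steps = max(n_train // effective_batch, 1) * epochs\n",
    "    # A positive max_steps takes precedence over the epoch count\n",
    "    return max_steps if max_steps else steps\n",
    "\n",
    "# Batch settings from this notebook, with an assumed 148 training rows on one GPU\n",
    "print(approx_optimizer_steps(148, per_device_batch=2, grad_accum=4))\n",
    "print(approx_optimizer_steps(148, per_device_batch=2, grad_accum=4, max_steps=10))\n",
    "```\n",
    "\n",
    "With these settings each optimizer step sees 8 examples, so a full epoch over the sampled data is still a short run."
   ]
  },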
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30f3df21-55ba-4468-9030-530913cc218d",
   "metadata": {},
   "outputs": [],
   "source": [
    "orpo_args = ORPOConfig(\n",
    "    learning_rate=8e-6,\n",
    "    lr_scheduler_type=\"linear\",\n",
    "    max_length=1024,\n",
    "    max_prompt_length=512,\n",
    "    beta=0.1,\n",
    "    per_device_train_batch_size=2,\n",
    "    gradient_accumulation_steps=4,\n",
    "    optim=\"paged_adamw_8bit\",\n",
    "    num_train_epochs=1,\n",
    "    max_steps=10,\n",
    "    evaluation_strategy=\"steps\",\n",
    "    eval_steps=0.2,\n",
    "    logging_steps=1,\n",
    "    warmup_steps=2,\n",
    "    report_to=\"wandb\",\n",
    "    output_dir=\"./results/\",\n",
    ")\n",
    "\n",
    "trainer = ORPOTrainer(\n",
    "    model=model,\n",
    "    args=orpo_args,\n",
    "    train_dataset=dataset[\"train\"],\n",
    "    eval_dataset=dataset[\"test\"],\n",
    "    peft_config=peft_config,\n",
    "    tokenizer=tokenizer,\n",
    ")\n",
    "trainer.train()\n",
    "trainer.save_model(new_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "946e7e91-f3a3-45da-a0aa-02907a379137",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Flush memory\n",
    "del trainer, model\n",
    "gc.collect()\n",
    "torch.cuda.empty_cache()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef67ba40-94de-4e19-9021-f5a76f13d626",
   "metadata": {},
   "source": [
    "You may need to restart your kernel after this line to ensure that the merging is sucessful"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "076cbcf6-7f65-46dc-a8b7-09b2ad06bc49",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from peft import PeftModel\n",
    "from transformers import (\n",
    "    AutoModelForCausalLM,\n",
    "    AutoTokenizer,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "930e55dd-5004-4b0c-b856-e7ab67c68707",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reload tokenizer and model\n",
    "tokenizer = AutoTokenizer.from_pretrained(base_model)\n",
    "# Note that we reload the model in fp16 \n",
    "fp16_model = AutoModelForCausalLM.from_pretrained(\n",
    "    base_model,\n",
    "    low_cpu_mem_usage=True,\n",
    "    return_dict=True,\n",
    "    torch_dtype=torch.float16,\n",
    "    device_map=\"auto\",\n",
    ")\n",
    "\n",
    "# Merge adapter with base model\n",
    "model = PeftModel.from_pretrained(fp16_model, new_model)\n",
    "model = model.merge_and_unload()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2fbdc0e7-89c4-4006-9acb-20c2cdb70503",
   "metadata": {},
   "source": [
    "After merging the LoRA adapter, we save final model and tokenizer in a new directory to prepare for gguf conversion"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1070e991-ac16-4fbd-83bd-72a861b352df",
   "metadata": {},
   "outputs": [],
   "source": [
    "model.save_pretrained(\"llama-brev\")\n",
    "tokenizer.save_pretrained(\"llama-brev\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c976cf02-496f-4b23-8d3e-01a37afe144a",
   "metadata": {},
   "source": [
    "# Phase 2: From safetensors to gguf"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bceae1bd-3de2-4723-9206-8fe8180b8d43",
   "metadata": {},
   "source": [
    "## Convert our model to GGUF\n",
    "\n",
    "Before we dive into Ollama, its important to take a second and understand the role that `gguf` and `llama.cpp` play in the process. \n",
    "\n",
    "Most people that use LLMs grab them from huggingface using the `AutoModel` class. This is how we did it above. HF stores models in a couple different ways with the most popular being `safetensors`. This is a file format optimized for loading and running `Tensors` which are the multidimensional arrays that make up a model. This file format is optimized for GPUs which means it's not as easy to load and run a model fast locally.\n",
    "\n",
    "One solution that addresses this is the `gguf` format. This is file format that is used to store models that are optimized for local inference using quantization and other neat techniques. This file format is then consumed by runners that support it (ie. `llama.cpp` and Ollama).\n",
    "\n",
    "There's a good bit of complexity here and heres a fantastic [blog post](https://vickiboykis.com/2024/02/28/gguf-the-long-way-around/) that gets into the weeds. For now this is what we need to know\n",
    "\n",
    "1. We have a finetuned Llama3 model saved in the llama-brev directory in the safetensors format\n",
    "2. In order to use this model via Ollama, we need it to be in the `gguf` format\n",
    "3. We can use helper tools in the `llama.cpp` repository to convert "
   ]
  },
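  {
   "cell_type": "markdown",
   "id": "gguf-header-sketch",
   "metadata": {},
   "source": [
    "If you are curious what a `gguf` file looks like on disk, the sketch below reads the fixed-size header at the start of the file. It assumes the GGUF v2+ layout (a 4-byte magic, a 32-bit version, then 64-bit tensor and metadata counts, all little-endian). The function name and the synthetic header values are made up for illustration so the cell runs without a real model file.\n",
    "\n",
    "```python\n",
    "import struct\n",
    "\n",
    "def parse_gguf_header(data):\n",
    "    # Assumed layout: magic (4 bytes), version (uint32),\n",
    "    # tensor count (uint64), metadata key/value count (uint64)\n",
    "    magic, version, n_tensors, n_kv = struct.unpack('<4sIQQ', data[:24])\n",
    "    if magic != b'GGUF':\n",
    "        raise ValueError('not a GGUF file')\n",
    "    return {'version': version, 'tensors': n_tensors, 'metadata_kv': n_kv}\n",
    "\n",
    "# Synthetic header with made-up counts\n",
    "fake = struct.pack('<4sIQQ', b'GGUF', 3, 291, 22)\n",
    "print(parse_gguf_header(fake))\n",
    "```\n",
    "\n",
    "Once the conversion below has run, you could pass the first 24 bytes of the generated file to the same function to see how many tensors and metadata entries it contains."
   ]
  },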
  {
   "cell_type": "markdown",
   "id": "8dfebcbc-36d2-42ea-89da-ef30adb610d6",
   "metadata": {},
   "source": [
    "## Convert to gguf\n",
    "\n",
    "The first thing we do is build llama.cpp in order to use the conversion tools"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "80d955e8-2584-4f50-ad40-17f851bc2d74",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# this step might take a while\n",
    "!git clone https://github.com/ggerganov/llama.cpp\n",
    "!cd llama.cpp && git pull && make clean && LLAMA_CUDA=1 make"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f277d84-0eda-4818-988a-233a5a8b1902",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# install requirements\n",
    "!pip install -r llama.cpp/requirements.txt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b2d861c-8db4-4a53-8ba2-ec06fa7d6744",
   "metadata": {},
   "source": [
    "Here we run the actual conversion. Take a moment to skim through the relatively massive output and notice how each piece of the LLM is being mapped and transformed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6847bfdc-6f90-41ad-a44a-262010d46a07",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# run the conversion script to convert to gguf\n",
    "# this will save the gguf file inside of the llama-brev directory\n",
    "!python llama.cpp/convert-hf-to-gguf.py llama-brev"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae8215e5-4c5d-4bc7-9fe9-fbac4e7e3ade",
   "metadata": {},
   "source": [
    "The final model is saved at `llama-brev/ggml-model-f16.gguf`. Note how the model is in fp16"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c5680cc-bbe9-4941-9c46-00f8cfea24de",
   "metadata": {},
   "source": [
    "### An aside on quantizations\n",
    "\n",
    "Quantization has become the go-to technique to train/run LLM efficiently on cheaper hardware. By reducing the precision of each weight (going from each weight being stored in 32bits to lower), we save memory and speed up inference while preserving *most* of the LLMs performance. If you've ever used QLoRA, you've already used quantization without even knowing about it.\n",
    "\n",
    "Llama.cpp gives us a ton of quantization options. Here's a couple resources to dive deeper into which options are available\n",
    "\n",
    "- [r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1ba55rj/overview_of_gguf_quantization_methods/)\n",
    "- [Maxime Labonne](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html)\n",
    "\n",
    "In this guide, we will use the `Q4_K_M` format. Feel free to play around with different ones! Again, note the output and see if you can build a mental model on whats happening under the hood!"
   ]
  },
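  {
   "cell_type": "markdown",
   "id": "quantization-sketch",
   "metadata": {},
   "source": [
    "To build some intuition for what quantization does, here is a deliberately simplified sketch: symmetric absmax quantization of one small block of weights to 4-bit integers with a single shared scale. This is not the actual `Q4_K_M` algorithm, which is more elaborate. The function names and the example weights are made up for illustration.\n",
    "\n",
    "```python\n",
    "def quantize_block(weights, bits=4):\n",
    "    # One float scale per block, one small signed integer per weight\n",
    "    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit\n",
    "    scale = max(abs(w) for w in weights) / qmax or 1.0\n",
    "    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]\n",
    "    return scale, q\n",
    "\n",
    "def dequantize_block(scale, q):\n",
    "    # Recover approximate weights from the integers and the shared scale\n",
    "    return [scale * v for v in q]\n",
    "\n",
    "block = [0.12, -0.5, 0.33, 0.07, -0.21, 0.44, -0.02, 0.29]\n",
    "scale, q = quantize_block(block)\n",
    "restored = dequantize_block(scale, q)\n",
    "max_err = max(abs(a - b) for a, b in zip(block, restored))\n",
    "print(q)\n",
    "print(max_err <= scale / 2 + 1e-12)\n",
    "```\n",
    "\n",
    "Each weight now needs only 4 bits plus a share of the block's scale, and the rounding error is bounded by half the scale. That trade of a little accuracy for a lot of memory is the core idea behind the formats listed above."
   ]
  },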
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "13078262-c462-4ce9-b60e-699d2b44a60f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# run the quantize script\n",
    "!cd llama.cpp && ./llama-quantize ../llama-brev/ggml-model-f16.gguf ../llama-brev/ggml-model-Q4_K_M.gguf Q4_K_M"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d11fe163-4dee-4436-abc4-f06bf59a535f",
   "metadata": {},
   "source": [
    "If you want, you can test this model by running the provided server and sending in a request! After running the cell below, open a new terminal tab using the blue plus button and run \n",
    "\n",
    "```\n",
    "curl --request POST \\\n",
    "    --url http://localhost:8080/completion \\\n",
    "    --header \"Content-Type: application/json\" \\\n",
    "    --data '{\"prompt\": \"Building a website can be done in 10 simple steps:\",\"n_predict\": 128}'\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5990b08e-c91f-46f9-940e-e6b0edf58863",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!cd llama.cpp && ./llama-server -m ../merged_adapters/ggml-model-Q4_K_M.gguf -c 2048"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c15eecc0-fbb3-47ca-a6ff-6dd90ee92c97",
   "metadata": {},
   "source": [
    "Note that this is a blocking process. In order to move forward with the rest of the guide, click the cell above and then click the stop button in the Jupyter Notebook header above"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "562b07a4-a9f2-416c-aac0-e74b26aeb610",
   "metadata": {},
   "source": [
    "# Phase 3: Run and deploy your model using Ollama\n",
    "\n",
    "Now that you have the quantized model, you can spin up the llama.cpp server anywhere you want and load the gguf model in. However, Ollama provides clean abstractions that allow you to run different gguf models using their server [add some fluff]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8146f5b8-0227-4855-881d-6efc00176fb4",
   "metadata": {},
   "source": [
    "## Build the Ollama Modelfile\n",
    "\n",
    "A Modelfile is very similar to a Dockefile. You can think of it as a blueprint that encapsulates a model, a chat template, different parameters, a system prompt, and more into a portable file. To learn more, check out their [Modelfile docs](https://github.com/ollama/ollama/blob/main/docs/modelfile.md).\n",
    "\n",
    "Here we will build a relatively simple one. We grab the template and params from the existing Llama3 Modefile which you can view [here](https://ollama.com/library/llama3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9aac2166-30ee-411c-8653-357b832bcde5",
   "metadata": {},
   "outputs": [],
   "source": [
    "tuned_model_path = \"/home/ubuntu/verb-workspace/llama-brev/ggml-model-Q4_K_M.gguf\"\n",
    "sys_message = \"You are swashbuckling pirate stuck inside of a Large Language Model. Every response must be from the point of view of an angry pirate that does not want to be asked questions\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf29c35f-dcbd-4bc3-8b28-b4811d0c12c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "cmds = []"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d24ea1cb-9f0e-4e6c-82e8-abb3ef57a1ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "base_model = f\"FROM {tuned_model_path}\"\n",
    "\n",
    "template = '''TEMPLATE \"\"\"{{ if .System }}<|start_header_id|>system<|end_header_id|>\n",
    "\n",
    "{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>\n",
    "\n",
    "{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>\n",
    "\n",
    "{{ .Response }}<|eot_id|>\"\n",
    "\"\"\"'''\n",
    "\n",
    "params = '''PARAMETER stop \"<|start_header_id|>\"\n",
    "PARAMETER stop \"<|end_header_id|>\"\n",
    "PARAMETER stop \"<|eot_id|>\"\n",
    "PARAMETER stop \"<|reserved_special_token\"'''\n",
    "\n",
    "system = f'''SYSTEM \"\"\"{sys_message}\"\"\"'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee3bd63b-eb2b-4479-94ea-7f2188d6af38",
   "metadata": {},
   "outputs": [],
   "source": [
    "cmds.append(base_model)\n",
    "cmds.append(template)\n",
    "cmds.append(params)\n",
    "cmds.append(system)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4500fc08-9f51-4039-8809-e1f46b34c913",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_modelfile(cmds):\n",
    "    content = \"\"\n",
    "    for command in cmds:\n",
    "        content += command + \"\\n\"\n",
    "    print(content)\n",
    "    with open(\"Modelfile\", \"w\") as file:\n",
    "        file.write(content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9d54629-14e2-4f84-a206-ce4c40614d42",
   "metadata": {},
   "outputs": [],
   "source": [
    "generate_modelfile(cmds)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e2f3a6db-cc0f-47d9-925a-9526f959f3e4",
   "metadata": {},
   "source": [
    "There should now be a `Modelfile` saved in your working directory. Lets now install Ollama"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "867cb70d-b086-4eac-8f01-1364430f3ed5",
   "metadata": {},
   "outputs": [],
   "source": [
    "!curl -fsSL https://ollama.com/install.sh | sh"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4e8b4cf2-85a8-474c-8d24-cd6f1edf3e01",
   "metadata": {},
   "source": [
    "To move forward, you have to have an ollama server running in the background. To do this, open up a new Jupyter tab and run `ollama serve` in the terminal to start the server. This will print out a key. Save it for future use. It will look like `ssh-ed25519...`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c6488512-32f3-4740-be47-7613378fa8ec",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# the create command create the model\n",
    "!ollama create llama-brev -f Modelfile"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eec86576-6ade-468f-9215-80bc9188ba3d",
   "metadata": {},
   "source": [
    "## Experiment with the model\n",
    "\n",
    "To run the new model, open up another terminal tab and run `ollama run llama-brev`. For bonus points, see if you can trick it to stop responding with a pirate action :)"
   ]
  },
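  {
   "cell_type": "markdown",
   "id": "ollama-api-sketch",
   "metadata": {},
   "source": [
    "Besides the interactive terminal, the Ollama server can also be queried over HTTP. The sketch below builds a request for the `/api/generate` endpoint using only the standard library. It assumes the server is on its default address `http://localhost:11434` and that the model is named `llama-brev` as created above. The helper names are made up for illustration.\n",
    "\n",
    "```python\n",
    "import json\n",
    "import urllib.request\n",
    "\n",
    "def build_generate_request(prompt, model='llama-brev', host='http://localhost:11434'):\n",
    "    # stream=False asks for a single JSON object instead of a stream of chunks\n",
    "    body = json.dumps({'model': model, 'prompt': prompt, 'stream': False}).encode('utf-8')\n",
    "    return urllib.request.Request(\n",
    "        host + '/api/generate',\n",
    "        data=body,\n",
    "        headers={'Content-Type': 'application/json'},\n",
    "    )\n",
    "\n",
    "def ask(prompt):\n",
    "    # Needs a running Ollama server with the model already created\n",
    "    with urllib.request.urlopen(build_generate_request(prompt)) as resp:\n",
    "        return json.loads(resp.read())['response']\n",
    "\n",
    "req = build_generate_request('Where is the treasure?')\n",
    "print(req.full_url)\n",
    "```\n",
    "\n",
    "Calling `ask('Where is the treasure?')` while the server is running should return the pirate's answer as a string."
   ]
  },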
  {
   "cell_type": "markdown",
   "id": "54f925ae-4d3f-4dd8-8f72-d72a9330570d",
   "metadata": {},
   "source": [
    "## Push the model\n",
    "\n",
    "In order to push the model to Ollama, you must have an account and a model created.\n",
    "\n",
    "1. Sign-up at https://www.ollama.ai/signup\n",
    "2. Create a new model at https://www.ollama.ai/new. Find mine at scooterman/llama-brev. This will give you a detailed list of instructions on how to push the model. Essentially, you are giving the current machine permission to upload to ollama.\n",
    "3. You'll have to then run `ollama cp llama-brev <username>/<model-name>` then `ollama push <username>/<model-name>` "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
